Solve exercise 1 in the lecture notes.
To apply logistic regression we need to know how to optimize functions, in our case the logistic regression loss (3.11) in the lecture notes. If you already have experience with optimization, you may not need the following two assignments.
a) Calculate the gradients of the following functions:
$$f(x, y) = \frac{1}{x^2+y^2}$$ and $$f(x, y) = x^2 y.$$
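As a reminder, the gradient collects the partial derivatives with respect to each variable. For an illustrative function such as $g(x, y) = x^2 + y$ (an assumption, not one of the exercise functions) it reads
$$\nabla g(x, y) = \begin{pmatrix} \partial g / \partial x \\ \partial g / \partial y \end{pmatrix} = \begin{pmatrix} 2x \\ 1 \end{pmatrix}.$$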
b) A standard way to find a minimum computationally is gradient descent.
Start at some (possibly random) point $ \overrightarrow{p}=(x,y)^T $ and move downhill, i.e. in the direction of the negative gradient. The step size $\lambda$ must be chosen small enough or controlled adaptively. When a loss function is optimized in a machine-learning context, $\lambda$ is also called the learning rate.
The update equation
$$ \overrightarrow{p_{i+1}} = \overrightarrow{p_{i}} - \lambda \cdot \nabla f(\overrightarrow{p_{i}}) $$ is then iterated until the norm of the gradient falls below some threshold.
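To make the iteration concrete, below is a minimal sketch of such a loop in Python. The function $g(x, y) = x^2 + y^2$, its gradient, and all names are illustrative assumptions, not part of the exercise.
import numpy as np

def g(p):
    # illustrative test function g(x, y) = x^2 + y^2 (not one of the exercise functions)
    return p[0]**2 + p[1]**2

def grad_g(p):
    # gradient of g: (2x, 2y)^T
    return np.array([2.0 * p[0], 2.0 * p[1]])

p = np.array([1.5, -2.0])    # starting point p_0
lam = 0.1                    # step size / learning rate
while np.linalg.norm(grad_g(p)) > 1e-6:
    p = p - lam * grad_g(p)  # step in the negative gradient direction
print(p)                     # ends up close to the minimum at (0, 0)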
Write down the update equations for the two functions in a)!
For this task we use the double well potential
$$V(x) = ax^4 + bx^2 + cx + d$$ with $a = 1$, $b = -3$, $c = 1$ and $d = 3.514$.
We seek to find the global minimum $x_{min}$ of this function with gradient descent. (In 1D the gradient is just the derivative.)
a) Calculate the derivative of $V(x)$ and the update equation for $x$ with learning rate $\lambda$.
b) Complete the code below.
c) Test the following combinations of starting point $x_0$ and learning rate $\lambda$:
$$(x_0, \lambda) = (-1.75, 0.001)$$
$$(x_0, \lambda) = (-1.75, 0.19)$$
$$(x_0, \lambda) = (-1.75, 0.1)$$
$$(x_0, \lambda) = (-1.75, 0.205)$$
d) How could you find a compromise between the behaviour of $(x_0, \lambda) = (-1.75, 0.001)$ and $(x_0, \lambda) = (-1.75, 0.19)$?
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
def update2(x, a, b, c, d, lam):
    # one gradient-descent step for V(x); fill in the update rule from part a)
    x = ___
    return x
def V(x, a, b, c, d):
    return a*x**4 + b*x**2 + c*x + d
a = 1
b = -3
c = 1
d = 3.514
x0 = -1.75
iterations = 101
lams = np.array([0.001, 0.19, 0.1, 0.205])
losses = np.empty(shape=(iterations, len(lams)))
results = np.empty(len(lams))
for j in range(len(lams)):
    x = x0
    lam = lams[j]
    for i in range(iterations):
        losses[i, j] = V(x, a, b, c, d)
        if i != iterations - 1:
            x = update2(x, a, b, c, d, lam)
    results[j] = x
for j in range(len(lams)):
    print(100*"-")
    print("Lambda: ", lams[j])
    print("xmin: ", results[j])
    print("Loss: ", V(results[j], a, b, c, d))
colors = {
    0.001: "blue",
    0.19: "red",
    0.1: "black",
    0.205: "orange"
}
plt.figure(figsize=(8, 8))
plt.title("Learning curves")
plt.xlabel("Epoch")
plt.ylabel("Loss V")
plt.xlim(0, iterations)
for i in range(len(lams)):
    lam = lams[i]
    plt.plot(range(iterations), losses[:, i], label=str(lam), color=colors[lam])
plt.legend()
plt.ylim(bottom=0)
plt.show()
plt.figure(figsize=(8, 8))
plt.title("Function V and Minima")
plt.xlabel("x")
plt.ylabel("V(x)")
xs = np.linspace(-2, 2, 100)
ys = V(xs, a, b, c, d)
plt.plot(xs, ys)
for j in range(len(lams)):
    lam = lams[j]
    xmin = results[j]
    vxmin = V(xmin, a, b, c, d)
    plt.plot(xmin, vxmin, marker='.', linestyle="None", label=str(lam), color=colors[lam], ms=10)
plt.legend()
plt.show()
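For reference, a possible completion of update2 is sketched below. It assumes the derivative $V'(x) = 4ax^3 + 2bx + c$ and is not necessarily the intended solution; the decayed variant hints at one common answer to d): shrink the learning rate over the iterations so the descent starts fast but settles down.
def update2_sketch(x, a, b, c, d, lam):
    # assumed completion: one gradient-descent step with V'(x) = 4*a*x**3 + 2*b*x + c
    # (d only shifts V and does not appear in the derivative)
    return x - lam * (4*a*x**3 + 2*b*x + c)

def update2_decayed(x, a, b, c, d, lam0, i):
    # illustrative compromise for d): decaying learning rate lam_i = lam0 / (1 + i)
    lam_i = lam0 / (1 + i)
    return x - lam_i * (4*a*x**3 + 2*b*x + c)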
Consider two 1D normal distributions with $\sigma^2=1$, located at $\mu_1=0.0$ and $\mu_2=2.0$. Sample N values from each of these distributions and assign the class labels "0" and "1" to the values ("0" for the values drawn from the distribution centered at $\mu_1=0.0$). Let this be your labeled data and learn a logistic regression model on it. Choose N=5 and N=100.
At which location does your model assign 50% probability to the class label being "0" (and "1")?
Hints:
Run and understand the example "MNIST classification using multinomial logistic regression" from scikit-learn.
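A possible starting point is sketched below; it assumes scikit-learn's LogisticRegression, and the variable names and random seed are illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

N = 100                                        # also try N = 5
rng = np.random.default_rng(0)
x0 = rng.normal(loc=0.0, scale=1.0, size=N)    # samples for class "0"
x1 = rng.normal(loc=2.0, scale=1.0, size=N)    # samples for class "1"
X = np.concatenate([x0, x1]).reshape(-1, 1)    # feature matrix with one feature
y = np.concatenate([np.zeros(N), np.ones(N)])  # class labels

clf = LogisticRegression()
clf.fit(X, y)

# the 50% decision point is where w*x + b = 0, i.e. x = -b/w
w = clf.coef_[0, 0]
b = clf.intercept_[0]
print("50% decision boundary at x =", -b / w)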